Skip to content

Conversation

@DiannaHohensee
Copy link
Contributor

@DiannaHohensee DiannaHohensee commented Jan 14, 2025

The DesiredBalanceReconciler is responsible for applying updates to
the cluster states that reflect shard allocation changes towards a
DesiredBalance. It isn't the Reconciler's responsibility to handle
pushing APM metrics. This patch cleans up the Reconciler constructor
and logic by extracting metric handling, modularizing metric updates
in the Allocator level of the code instead of being split across the
two components.


I want to expand upon the metric collection, so I'm yanking it out of the Reconciler. Otherwise I'll have to push more objects into the Reconciler to create all the stats there, which isn't really the Reconciler's business. Precursor to plugging in the work started over here #119916: once they're both in, I can start plug in the new stats.
This is purely a refactor now.

The DesiredBalanceReconciler applies desired balance to the routing
table cluster metadata. This patch extract logic from the reconciler
that pushes updates to the desired balance metrics. This removes
extra parameters passed into the DesiredBalanceReconciler, and
enables expanding the metrics that are collected without passing
more objects into the reconciler class.

Relates ES-10341
@DiannaHohensee DiannaHohensee added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team labels Jan 14, 2025
@DiannaHohensee DiannaHohensee self-assigned this Jan 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

long index,
RoutingAllocation routingAllocation,
List<ShardRouting> ignoredShards,
BalancingRoundStats.Builder clusterAllocationStatsBuilder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's a bit confusing how there's a stats builder on the input, but we only use it for the eager reconciliation (assuming I've understood the code right). Could we instead just create one specifically for the eager reconciliation to make it clear it's local to that process?

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. What part are you referring to as eager reconciliation? Is it this balance() method?

My intention with passing around a stats data structure is to collect additional statistics throughout the balancing round -- for example, I'll want balancing round start and end times, so I'll need stats tracking as early as this line before computation begins. A balancing round begins with a DesiredBalanceInput submitted to the ContinuousComputation object, which queues requests for a balancing round: the latest DesiredBalanceInput submission will be run async.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean in allocate() we call reconcile(...) with desiredBalanceInput.clusterAllocationStatsBuilder() but then in the continuous computation we create separate StatsBuilders to pass to submitReconcileTask, even though desiredBalanceInput is in scope.
I just wonder whether the statsBuilder belongs in desiredBalanceInput? but I find the code a bit hard to follow so maybe there's an obvious reason for it being there.
I think I'm struggling to understand the lifecycle of the builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. I think you found something buggy. So allocate will queue a DesiredBalanceInput for computation, which gets picked up asynchronously. But then reconcile() is run immediately and updates the metrics. Computation happens eventually.

I thought I was being thorough by passing the stats builder everywhere, but reconciliation always runs whereas computation occasionally runs (queues, so an input will get skipped if there's a new input). And then computation will normally lead to reconciliation being called.

Okay... So allocate() needs two separate StatsBuilders, one for immediate reconciliation, and the other passed to computation but ultimately reconciliation. Or alternatively we skip instrumenting computation and create fresh StatsBuilders for each reconciliation event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there will be some use of the StatsBuilder in the computation phase (is that what // TODO: this will be expanded upon shortly in ES-10341. refers to?)

If so, is is as simple as just

  • Removing StatsBuilder from the DesiredBalanceInput
  • Creating one locally to pass to reconcile(...) in allocate(...)
  • Using the existing local StatsBuilder in processInput

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've updated this patch to be purely refactor work. It doesn't lead into new work anymore. I think it's cleaner without pushing desired metrics objects into the Reconciler to handle: this cleans up the constructor and purpose of the Reconciler.

@pxsalehi
Copy link
Member

I find it hard to judge how useful the refactoring decisions here and in #119916 are without actually seeing what metric are going to be collected. IMO, it would good to see a draft PR (potentially w/o tests is also ok) which shows concretely what we're collecting and then work out what are reasonable pieces to separate out, if that ends up being too large to review. My experience adding some metrics here was that while trying to plug things in, you'd stumble upon different twists and wrinkles that might impact the structure. (not sure how much has been discussed, but I find it hard to asses these refactoring PRs).

@DiannaHohensee
Copy link
Contributor Author

DiannaHohensee commented Jan 14, 2025

I find it hard to judge how useful the refactoring decisions here and in #119916 are without actually seeing what metric are going to be collected.

Sure, here's a WIP PR for the summed changes. I have some notes on what to look at in that PR. Let me know what you think.

My experience adding some metrics here was that while trying to plug things in, you'd stumble upon different twists and wrinkles that might impact the structure. (not sure how much has been discussed, but I find it hard to asses these refactoring PRs).

Yes, the code is definitely tricky. It takes a lot of thinking to sort out what anything is right now.

I haven't really discussed the code implementation with anyone so far, only what I've tried to explain in the PRs. The design ticket, ES-10260, explains the high-level, in detail, to see where I'm headed.

@pxsalehi
Copy link
Member

Thanks, I had a look at the WIP PR. I'm a bit lost following the changes, probably because: 1) the PR seems to be very general even though the specific metrics are still TODO. Any chance we could start small and gradually extend it? It is hard to follow the changes. 2) There is a lot of noise that is not really a core change, e.g. adding new comments to existing code and renames, it makes it hard to focus on what the actual change is. Just two suggestions that would make reviewing for me personally easier. As for the way stats are collected, I think I agree that passing around builder like that is a bit confusing and probably we shouldn't add it to the input like that. Do the gathered metrics need to be all centralized like that?

@DiannaHohensee
Copy link
Contributor Author

I've realized that my understanding of the code was incorrect. Reconciliation only applies a handful of changes from the current desired balance computation, until it reaches throttling saturation (e.g. each node can only participate in X concurrent shard recoveries). So, as an example, the reconciler might stop assigning shards in allocateUnassigned, and still run moveShards and balance, but those methods won't do anything further.

So I'll need to instrument the computation, not reconciliation. I'm investigating if I can compare old to new desired balance here to get the stats.

I think refactoring the reconciler to pull out stats reporting still makes sense, makes things more modular, but I'll see how it looks.

@DiannaHohensee
Copy link
Contributor Author

Slowly making progress here.. I've filed ES-10581 as an extra twist.

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, I've refactored my change to only extract the logic out of the Reconciler, keeping all the desired balance metric update logic in the Allocator.

long index,
RoutingAllocation routingAllocation,
List<ShardRouting> ignoredShards,
BalancingRoundStats.Builder clusterAllocationStatsBuilder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've updated this patch to be purely refactor work. It doesn't lead into new work anymore. I think it's cleaner without pushing desired metrics objects into the Reconciler to handle: this cleans up the constructor and purpose of the Reconciler.

@DiannaHohensee
Copy link
Contributor Author

I'm closing this -- since there hasn't been much activity and it's a bit messy from the pivot -- and replacing it with #121771.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants